Improve take performance on List arrays by AdamGS · Pull Request #9643 · apache/arrow-rs

AdamGS · 2026-04-01T12:06:38Z

Which issue does this PR close?

Closes #NNN.

Rationale for this change

This PR builds on top of #9626, improving the results on those benchmarks.

What changes are included in this PR?

Similar to #9625 ("Improve take_bytes perf in the null cases between 10-25%"), branch the function into null and non-null paths.
Copy the list elements in a single pass while building the offsets, which allocates less intermediate state.
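
For readers skimming the thread, here is a rough, std-only sketch of the two ideas above. It is not the code in this PR: `SimpleList` and `take_list` are hypothetical stand-ins that use plain `Vec`s instead of Arrow buffers, and the real kernel is generic over the offset and index types.

```rust
/// Hypothetical stand-in for a list array: `offsets[i]..offsets[i + 1]`
/// delimits list `i` inside `values`; `validity[i] == false` marks a null list.
struct SimpleList {
    offsets: Vec<usize>,
    values: Vec<i32>,
    validity: Option<Vec<bool>>,
}

/// Gather the lists at `indices`. A `None` index produces a null output list.
fn take_list(input: &SimpleList, indices: &[Option<usize>]) -> SimpleList {
    let mut offsets = Vec::with_capacity(indices.len() + 1);
    let mut values = Vec::new();
    offsets.push(0);

    let has_nulls = input.validity.is_some() || indices.iter().any(|i| i.is_none());

    if !has_nulls {
        // Non-null path: no validity checks inside the loop.
        // `flatten` is safe here because every index is `Some`.
        for &i in indices.iter().flatten() {
            let (start, end) = (input.offsets[i], input.offsets[i + 1]);
            // Copy the elements and record the new offset in the same pass.
            values.extend_from_slice(&input.values[start..end]);
            offsets.push(values.len());
        }
        return SimpleList { offsets, values, validity: None };
    }

    // Null path: track output validity; null lists contribute no elements.
    let mut validity = Vec::with_capacity(indices.len());
    for &idx in indices {
        // Keep the index only if it is non-null and points at a valid list.
        let valid_idx = idx.filter(|&i| input.validity.as_ref().map_or(true, |v| v[i]));
        if let Some(i) = valid_idx {
            let (start, end) = (input.offsets[i], input.offsets[i + 1]);
            values.extend_from_slice(&input.values[start..end]);
        }
        offsets.push(values.len());
        validity.push(valid_idx.is_some());
    }
    SimpleList { offsets, values, validity: Some(validity) }
}
```

Deciding once, up front, whether any nulls are involved keeps the common non-null loop free of per-element validity branches, which is presumably where part of the speedup comes from.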

Are these changes tested?

Added a few tests for sliced list arrays.
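
As background on why sliced inputs deserve their own tests: slicing a list array narrows the offsets but leaves the child values untouched, so the first offset is usually non-zero. The sketch below illustrates that case with plain slices; `take_lists` and `sliced_take_example` are hypothetical names, and the tests added in this PR use the real Arrow array types instead.

```rust
/// Gather lists by index from an offsets slice that may not start at zero.
fn take_lists(offsets: &[usize], values: &[i32], indices: &[usize]) -> (Vec<usize>, Vec<i32>) {
    let mut new_offsets = Vec::with_capacity(indices.len() + 1);
    let mut new_values = Vec::new();
    new_offsets.push(0);
    for &i in indices {
        // Offsets are absolute positions into `values`, even for a slice.
        new_values.extend_from_slice(&values[offsets[i]..offsets[i + 1]]);
        new_offsets.push(new_values.len());
    }
    (new_offsets, new_values)
}

/// Take from a "sliced" list whose first list has been dropped.
fn sliced_take_example() -> (Vec<usize>, Vec<i32>) {
    let values = [10, 11, 20, 21, 22, 30, 31, 32, 33];
    let full_offsets = [0, 2, 5, 9];
    // Slice away the first list: the remaining offsets start at 2, not 0.
    let sliced_offsets = &full_offsets[1..];
    take_lists(sliced_offsets, &values, &[1, 0])
}
```

The output offsets start at zero again, which is the property a test on sliced input would want to pin down.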

Are there any user-facing changes?

No

AdamGS · 2026-04-01T12:07:10Z

Results on the benchmarks in #9626:

take list i32 512       time:   [4.4872 µs 4.5048 µs 4.5246 µs]
                        change: [−12.029% −11.670% −11.245%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe

take list i32 1024      time:   [8.1540 µs 8.1715 µs 8.1891 µs]
                        change: [−24.814% −22.002% −19.215%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) high mild
  4 (4.00%) high severe

take list i32 null values 1024
                        time:   [5.5799 µs 5.6028 µs 5.6273 µs]
                        change: [−11.178% −4.1193% +8.6975%] (p = 0.67 > 0.05)
                        No change in performance detected.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) high mild
  3 (3.00%) high severe

take list i32 null indices 1024
                        time:   [7.9070 µs 7.9327 µs 7.9632 µs]
                        change: [−80.594% −80.504% −80.409%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

take list i32 null values null indices 1024
                        time:   [5.3172 µs 5.3387 µs 5.3660 µs]
                        change: [−14.330% −13.956% −13.587%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) high mild
  3 (3.00%) high severe

Signed-off-by: Adam Gutglick <adam@spiraldb.com>

alamb · 2026-04-16T11:34:11Z

run benchmark take_kernels

adriangbot · 2026-04-16T11:35:30Z

🤖 Arrow criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4259726053-1360-r46bp 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)

Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing adamg/list-take-perf-improvement (95d54ac) to aac969d (merge-base) diff
BENCH_NAME=take_kernels
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench take_kernels
BENCH_FILTER=
Results will be posted here when complete

adriangbot · 2026-04-16T11:48:53Z

🤖 Arrow criterion benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

group                                                                     adamg_list-take-perf-improvement       main
-----                                                                     --------------------------------       ----
take bool 1024                                                            1.00   1031.9±1.27ns        ? ?/sec    1.00   1032.5±0.71ns        ? ?/sec
take bool 512                                                             1.00    573.0±2.65ns        ? ?/sec    1.00    572.6±0.47ns        ? ?/sec
take bool null indices 1024                                               1.10    851.0±8.27ns        ? ?/sec    1.00   770.8±28.21ns        ? ?/sec
take bool null values 1024                                                1.02      2.1±0.03µs        ? ?/sec    1.00      2.0±0.00µs        ? ?/sec
take bool null values null indices 1024                                   1.10  1614.0±14.89ns        ? ?/sec    1.00  1461.9±14.53ns        ? ?/sec
take check bounds i32 1024                                                1.00    657.6±2.35ns        ? ?/sec    1.01    665.6±0.95ns        ? ?/sec
take check bounds i32 512                                                 1.17    457.1±4.85ns        ? ?/sec    1.00    389.5±0.65ns        ? ?/sec
take fsb value len: 12, indices: 1024                                     1.00      2.7±0.16µs        ? ?/sec    1.00      2.7±0.16µs        ? ?/sec
take fsb value len: 12, null values, indices: 1024                        1.00      3.7±0.16µs        ? ?/sec    1.00      3.7±0.16µs        ? ?/sec
take fsb value optimized len: 16, indices: 1024                           1.06    730.3±2.17ns        ? ?/sec    1.00    692.1±4.55ns        ? ?/sec
take fsb value optimized len: 16, null values, indices: 1024              1.02   1786.7±2.58ns        ? ?/sec    1.00   1755.1±3.23ns        ? ?/sec
take i32 1024                                                             1.00    514.0±0.92ns        ? ?/sec    1.02    525.3±0.39ns        ? ?/sec
take i32 512                                                              1.01    355.5±4.15ns        ? ?/sec    1.00    351.7±0.95ns        ? ?/sec
take i32 null indices 1024                                                1.02    872.2±1.27ns        ? ?/sec    1.00    857.3±0.95ns        ? ?/sec
take i32 null values 1024                                                 1.00   1544.9±4.51ns        ? ?/sec    1.01   1564.8±2.53ns        ? ?/sec
take i32 null values null indices 1024                                    1.02   1719.0±4.28ns        ? ?/sec    1.00   1678.0±1.84ns        ? ?/sec
take list i32 1024                                                        1.00      7.8±0.04µs        ? ?/sec    1.65     12.9±0.23µs        ? ?/sec
take list i32 512                                                         1.00      4.3±0.03µs        ? ?/sec    1.68      7.2±0.02µs        ? ?/sec
take list i32 null indices 1024                                           1.00      9.4±0.05µs        ? ?/sec    6.40     60.0±0.26µs        ? ?/sec
take list i32 null values 1024                                            1.00      5.7±0.01µs        ? ?/sec    1.35      7.7±0.05µs        ? ?/sec
take list i32 null values null indices 1024                               1.00      6.8±0.06µs        ? ?/sec    1.17      8.0±0.02µs        ? ?/sec
take listview i32 1024                                                    1.00   1333.9±2.40ns        ? ?/sec    1.07   1433.6±4.40ns        ? ?/sec
take listview i32 512                                                     1.00    949.4±2.23ns        ? ?/sec    1.03    977.4±2.34ns        ? ?/sec
take listview i32 null indices 1024                                       1.00   1913.2±3.06ns        ? ?/sec    1.05      2.0±0.00µs        ? ?/sec
take listview i32 null values 1024                                        1.01      2.3±0.00µs        ? ?/sec    1.00      2.3±0.00µs        ? ?/sec
take listview i32 null values null indices 1024                           1.06      2.9±0.00µs        ? ?/sec    1.00      2.7±0.00µs        ? ?/sec
take primitive run logical len: 1024, physical len: 512, indices: 1024    1.00     17.0±0.07µs        ? ?/sec    1.02     17.2±0.07µs        ? ?/sec
take str 1024                                                             1.00      8.4±0.05µs        ? ?/sec    1.00      8.4±0.04µs        ? ?/sec
take str 512                                                              1.02      4.0±0.02µs        ? ?/sec    1.00      3.9±0.02µs        ? ?/sec
take str null indices 1024                                                1.01      5.8±0.02µs        ? ?/sec    1.00      5.7±0.02µs        ? ?/sec
take str null indices 512                                                 1.00      2.7±0.01µs        ? ?/sec    1.00      2.7±0.02µs        ? ?/sec
take str null values 1024                                                 1.00      6.4±0.03µs        ? ?/sec    1.00      6.4±0.02µs        ? ?/sec
take str null values null indices 1024                                    1.00      5.3±0.01µs        ? ?/sec    1.06      5.6±0.01µs        ? ?/sec
take stringview 1024                                                      1.00    909.1±2.66ns        ? ?/sec    1.08    982.6±2.99ns        ? ?/sec
take stringview 512                                                       1.00    557.3±1.51ns        ? ?/sec    1.00    557.6±1.30ns        ? ?/sec
take stringview null indices 1024                                         1.00    914.2±5.87ns        ? ?/sec    1.00    916.1±2.74ns        ? ?/sec
take stringview null indices 512                                          1.01    565.0±1.85ns        ? ?/sec    1.00    560.7±1.12ns        ? ?/sec
take stringview null values 1024                                          1.00   1895.0±1.62ns        ? ?/sec    1.01   1907.8±5.24ns        ? ?/sec
take stringview null values null indices 1024                             1.04   1753.1±1.78ns        ? ?/sec    1.00   1685.2±3.03ns        ? ?/sec

Resource Usage

base (merge-base)

Metric	Value
Wall time	374.3s
Peak memory	1.7 GiB
Avg memory	1.7 GiB
CPU user	373.3s
CPU sys	0.8s
Peak spill	0 B

branch

Metric	Value
Wall time	371.4s
Peak memory	1.7 GiB
Avg memory	1.7 GiB
CPU user	371.1s
CPU sys	0.2s
Peak spill	0 B

alamb · 2026-04-16T12:17:58Z

group                                                                     adamg_list-take-perf-improvement       main
-----                                                                     --------------------------------       ----
...
take list i32 1024                                                        1.00      7.8±0.04µs        ? ?/sec    1.65     12.9±0.23µs        ? ?/sec
take list i32 512                                                         1.00      4.3±0.03µs        ? ?/sec    1.68      7.2±0.02µs        ? ?/sec
take list i32 null indices 1024                                           1.00      9.4±0.05µs        ? ?/sec    6.40     60.0±0.26µs        ? ?/sec
take list i32 null values 1024                                            1.00      5.7±0.01µs        ? ?/sec    1.35      7.7±0.05µs        ? ?/sec
take list i32 null values null indices 1024                               1.00      6.8±0.06µs        ? ?/sec    1.17      8.0±0.02µs        ? ?/sec
...

🚀

alamb

Thank you for this @AdamGS -- I went through it carefully. I have some ideas on potential ways to make it faster still, but we can perhaps do that as a follow-on PR.

alamb · 2026-04-16T11:42:44Z

+            }
+        }
+        Some(output_nulls) => {
+            new_offsets.resize(indices.len() + 1, OffsetType::Native::zero());


Why initialize the offsets to zero, when they are immediately overwritten?

You could probably use push and extend instead of fill and setting offsets directly

I assumed that this would be a nice way to avoid unsafe while keeping performance, but locally perf seems to be as good
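
To make the two options in this exchange concrete, here is a small std-only sketch. The function names are hypothetical, and plain `Vec<usize>` stands in for the offsets buffer used in the PR.

```rust
/// Zero-fill the offsets up front, then overwrite each slot by index.
fn offsets_resize_then_set(lengths: &[usize]) -> Vec<usize> {
    let mut offsets = Vec::new();
    offsets.resize(lengths.len() + 1, 0);
    let mut total = 0;
    for (i, len) in lengths.iter().enumerate() {
        total += len;
        offsets[i + 1] = total; // overwrites the zero written by `resize`
    }
    offsets
}

/// Reserve capacity once and push each offset, so nothing is written twice.
fn offsets_push(lengths: &[usize]) -> Vec<usize> {
    let mut offsets = Vec::with_capacity(lengths.len() + 1);
    offsets.push(0);
    let mut total = 0;
    for len in lengths {
        total += len;
        offsets.push(total);
    }
    offsets
}
```

Both produce the same offsets; the push variant skips the initial zero-fill. Whether that shows up in practice depends on the workload, and the reply above reports the two performing about the same locally.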

alamb · 2026-04-16T11:47:44Z

        )
    }

-    #[test]


It would make it easier to see what you have changed if you didn't also move the tests around

no idea why I did that

AdamGS · 2026-04-16T12:23:21Z

I'll address all the comments later today, should be ready by tomorrow

alamb · 2026-04-16T15:57:35Z

run benchmark take_kernels

adriangbot · 2026-04-16T15:59:43Z

🤖 Arrow criterion benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4261494662-1379-xwz99 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

Comparing adamg/list-take-perf-improvement (57df7c6) to aac969d (merge-base) diff
BENCH_NAME=take_kernels
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench take_kernels
BENCH_FILTER=
Results will be posted here when complete

adriangbot · 2026-04-16T16:12:59Z

🤖 Arrow criterion benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

group                                                                     adamg_list-take-perf-improvement       main
-----                                                                     --------------------------------       ----
take bool 1024                                                            1.00   1033.9±0.59ns        ? ?/sec    1.01   1039.3±1.30ns        ? ?/sec
take bool 512                                                             1.00    571.3±0.73ns        ? ?/sec    1.00    571.3±1.09ns        ? ?/sec
take bool null indices 1024                                               1.09    847.8±8.22ns        ? ?/sec    1.00   780.2±29.70ns        ? ?/sec
take bool null values 1024                                                1.00      2.0±0.00µs        ? ?/sec    1.00      2.0±0.00µs        ? ?/sec
take bool null values null indices 1024                                   1.11  1614.7±12.37ns        ? ?/sec    1.00   1452.9±9.55ns        ? ?/sec
take check bounds i32 1024                                                1.00    657.8±2.14ns        ? ?/sec    1.01    666.3±1.04ns        ? ?/sec
take check bounds i32 512                                                 1.16    453.3±0.78ns        ? ?/sec    1.00    390.0±0.82ns        ? ?/sec
take fsb value len: 12, indices: 1024                                     1.00      2.7±0.19µs        ? ?/sec    1.00      2.7±0.19µs        ? ?/sec
take fsb value len: 12, null values, indices: 1024                        1.00      3.8±0.19µs        ? ?/sec    1.00      3.8±0.19µs        ? ?/sec
take fsb value optimized len: 16, indices: 1024                           1.00    688.1±1.40ns        ? ?/sec    1.01    697.5±1.28ns        ? ?/sec
take fsb value optimized len: 16, null values, indices: 1024              1.00   1750.6±1.28ns        ? ?/sec    1.00   1752.2±3.15ns        ? ?/sec
take i32 1024                                                             1.00    516.1±1.68ns        ? ?/sec    1.01    522.5±3.25ns        ? ?/sec
take i32 512                                                              1.00    353.0±0.66ns        ? ?/sec    1.00    354.7±3.29ns        ? ?/sec
take i32 null indices 1024                                                1.02    876.0±3.36ns        ? ?/sec    1.00    858.5±2.00ns        ? ?/sec
take i32 null values 1024                                                 1.11  1715.1±29.14ns        ? ?/sec    1.00   1548.9±6.78ns        ? ?/sec
take i32 null values null indices 1024                                    1.03   1714.7±1.60ns        ? ?/sec    1.00   1664.7±3.52ns        ? ?/sec
take list i32 1024                                                        1.00      7.8±0.03µs        ? ?/sec    1.66     12.9±0.03µs        ? ?/sec
take list i32 512                                                         1.00      4.3±0.01µs        ? ?/sec    1.68      7.2±0.02µs        ? ?/sec
take list i32 null indices 1024                                           1.00      9.7±0.10µs        ? ?/sec    6.28     60.6±0.48µs        ? ?/sec
take list i32 null values 1024                                            1.00      5.7±0.02µs        ? ?/sec    1.34      7.7±0.01µs        ? ?/sec
take list i32 null values null indices 1024                               1.00      6.8±0.05µs        ? ?/sec    1.18      8.0±0.01µs        ? ?/sec
take listview i32 1024                                                    1.02   1334.2±2.22ns        ? ?/sec    1.00   1310.8±7.29ns        ? ?/sec
take listview i32 512                                                     1.00    952.1±2.04ns        ? ?/sec    1.14   1084.0±2.29ns        ? ?/sec
take listview i32 null indices 1024                                       1.00   1972.9±3.14ns        ? ?/sec    1.01   1995.7±4.14ns        ? ?/sec
take listview i32 null values 1024                                        1.00      2.3±0.00µs        ? ?/sec    1.03      2.4±0.00µs        ? ?/sec
take listview i32 null values null indices 1024                           1.00      2.8±0.00µs        ? ?/sec    1.00      2.8±0.01µs        ? ?/sec
take primitive run logical len: 1024, physical len: 512, indices: 1024    1.00     17.0±0.07µs        ? ?/sec    1.02     17.3±0.07µs        ? ?/sec
take str 1024                                                             1.02      8.5±0.04µs        ? ?/sec    1.00      8.4±0.04µs        ? ?/sec
take str 512                                                              1.02      4.0±0.02µs        ? ?/sec    1.00      3.9±0.03µs        ? ?/sec
take str null indices 1024                                                1.00      5.7±0.02µs        ? ?/sec    1.00      5.7±0.02µs        ? ?/sec
take str null indices 512                                                 1.05      2.9±0.01µs        ? ?/sec    1.00      2.7±0.01µs        ? ?/sec
take str null values 1024                                                 1.01      6.4±0.03µs        ? ?/sec    1.00      6.4±0.02µs        ? ?/sec
take str null values null indices 1024                                    1.03      5.3±0.01µs        ? ?/sec    1.00      5.2±0.01µs        ? ?/sec
take stringview 1024                                                      1.00    753.4±4.08ns        ? ?/sec    1.31    983.4±1.51ns        ? ?/sec
take stringview 512                                                       1.00    472.8±1.64ns        ? ?/sec    1.18    557.6±1.12ns        ? ?/sec
take stringview null indices 1024                                         1.00    917.0±1.36ns        ? ?/sec    1.00    919.2±1.35ns        ? ?/sec
take stringview null indices 512                                          1.00    561.2±1.12ns        ? ?/sec    1.00    563.7±1.17ns        ? ?/sec
take stringview null values 1024                                          1.00   1802.9±1.35ns        ? ?/sec    1.05   1901.7±3.00ns        ? ?/sec
take stringview null values null indices 1024                             1.03   1732.1±2.31ns        ? ?/sec    1.00   1676.1±5.41ns        ? ?/sec

Resource Usage

base (merge-base)

Metric	Value
Wall time	371.3s
Peak memory	2.1 GiB
Avg memory	2.1 GiB
CPU user	370.4s
CPU sys	0.7s
Peak spill	0 B

branch

Metric	Value
Wall time	376.5s
Peak memory	2.1 GiB
Avg memory	2.1 GiB
CPU user	376.3s
CPU sys	0.1s
Peak spill	0 B

AdamGS · 2026-04-16T16:18:58Z

take list i32 1024                                                        1.00      7.8±0.03µs        ? ?/sec    1.66     12.9±0.03µs        ? ?/sec
take list i32 512                                                         1.00      4.3±0.01µs        ? ?/sec    1.68      7.2±0.02µs        ? ?/sec
take list i32 null indices 1024                                           1.00      9.7±0.10µs        ? ?/sec    6.28     60.6±0.48µs        ? ?/sec
take list i32 null values 1024                                            1.00      5.7±0.02µs        ? ?/sec    1.34      7.7±0.01µs        ? ?/sec
take list i32 null values null indices 1024                               1.00      6.8±0.05µs        ? ?/sec    1.18      8.0±0.01µs        ? ?/sec

🥳

alamb · 2026-04-16T16:20:37Z

Thanks @AdamGS

github-actions bot added the arrow Changes to the arrow crate label Apr 1, 2026

AdamGS force-pushed the adamg/list-take-perf-improvement branch from 8b7e3be to c66bad6 Compare April 8, 2026 11:48

Improve take on List arrays

95d54ac

Signed-off-by: Adam Gutglick <adam@spiraldb.com>

AdamGS force-pushed the adamg/list-take-perf-improvement branch from c66bad6 to 95d54ac Compare April 8, 2026 11:53

alamb added the performance label Apr 13, 2026

alamb approved these changes Apr 16, 2026

Addressing CR comments

57df7c6

alamb approved these changes Apr 16, 2026

alamb merged commit 89b1497 into apache:main Apr 16, 2026
26 checks passed
